legal text
Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs
Representing the temporal evolution of legal norms is a critical challenge for automated processing. While foundational frameworks exist, they lack a formal pattern for granular, component-level versioning, hindering the deterministic point-in-time reconstruction of legal texts required by reliable AI applications. This paper proposes a structured, temporal modeling pattern grounded in the LRMoo ontology. Our approach models a norm's evolution as a diachronic chain of versioned F1 Works, distinguishing between language-agnostic Temporal Versions (TV)-each being a distinct Work-and their monolingual Language Versions (LV), modeled as F2 Expressions. The legislative amendment process is formalized through event-centric modeling, allowing changes to be traced precisely. Using the Brazilian Constitution as a case study, we demonstrate that our architecture enables the exact reconstruction of any part of a legal text as it existed on a specific date. This provides a verifiable semantic backbone for legal knowledge graphs, offering a deterministic foundation for trustworthy legal AI.
- South America > Brazil > Federal District > Brasília (0.04)
- Oceania > Palau (0.04)
- North America > United States (0.04)
- (2 more...)
- Law (1.00)
- Government > Regional Government (0.67)
ASVRI-Legal: Fine-Tuning LLMs with Retrieval Augmented Generation for Enhanced Legal Regulation
Octadion, One, Prakoso, Bondan Sapta, Setiawan, Nanang Yudi, Yudistira, Novanto
In this study, we explore the fine-tuning of Large Language Models (LLMs) to better support policymakers in their crucial work of understanding, analyzing, and crafting legal regulations. To equip the model with a deep understanding of legal texts, we curated a supervised dataset tailored to the specific needs of the legal domain. Additionally, we integrated the Retrieval-Augmented Generation (RAG) method, enabling the LLM to access and incorporate up-to-date legal knowledge from external sources. This combination of fine-tuning and RAG-based augmentation results in a tool that not only processes legal information but actively assists policymakers in interpreting regulations and drafting new ones that align with current needs. The results demonstrate that this approach can significantly enhance the effectiveness of legal research and regulation development, offering a valuable resource in the ever-evolving field of law.
- Law (1.00)
- Education > Educational Setting > K-12 Education (0.47)
LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks
Meher, Dipak, Domeniconi, Carlotta, Correa-Cabrera, Guadalupe
Abstract--Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these networks but are often long, unstructured, and filled with ambiguous or shifting references, posing significant challenges for automated knowledge graph (KG) construction. Existing methods either overlook coreference resolution or fail to scale beyond short text spans, leading to fragmented graphs and inconsistent entity linking. We propose LINK-KG, a modular framework that integrates a three-stage, LLM-guided coreference resolution pipeline with downstream KG extraction. At the core of our approach is a type-specific Prompt Cache, which consistently tracks and resolves references across document chunks, enabling clean and disambiguated narratives for structured knowledge graph construction from both short and long legal texts. LINK-KG reduces average node duplication by 45.21% and noisy nodes by 32.22% compared to baseline methods, resulting in cleaner and more coherent graph structures. Human smuggling networks represent highly adaptive and organized systems involving a web of actors, routes, vehicles, and intermediaries, often operating under the radar of restrictive immigration policies [1]. These networks exploit legal loopholes, adjust swiftly to enforcement changes, and frequently intersect with transnational criminal organizations. Effectively analyzing their structure and behavior is critical for informing policy, enhancing security, and preventing exploitation. However, much of the actionable insight remains embedded in lengthy, unstructured legal documents, such as court rulings, field reports, and case transcripts, making automated analysis both essential and challenging.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Immigration & Customs (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.04)
- (6 more...)
- Law > Litigation (1.00)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- North America > United States (0.28)
- Asia > China > Liaoning Province (0.14)
- Law > Litigation (1.00)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- (3 more...)
Large Language Models Meet Legal Artificial Intelligence: A Survey
Hou, Zhitian, Ye, Zihan, Zeng, Nanli, Hao, Tianyong, Zeng, Kun
Large Language Models (LLMs) have significantly advanced the development of Legal Artificial Intelligence (Legal AI) in recent years, enhancing the efficiency and accuracy of legal tasks. To advance research and applications of LLM-based approaches in legal domain, this paper provides a comprehensive review of 16 legal LLMs series and 47 LLM-based frameworks for legal tasks, and also gather 15 benchmarks and 29 datasets to evaluate different legal capabilities. Additionally, we analyse the challenges and discuss future directions for LLM-based approaches in the legal domain. We hope this paper provides a systematic introduction for beginners and encourages future research in this field. Resources are available at https://github.com/ZhitianHou/LLMs4LegalAI.
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Austria > Vienna (0.14)
- (17 more...)
- Law (1.00)
- Government > Regional Government (0.93)
Optimizing Legal Document Retrieval in Vietnamese with Semi-Hard Negative Mining
Le, Van-Hoang, Nguyen, Duc-Vu, Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy
Large Language Models (LLMs) face significant challenges in specialized domains like law, where precision and domain-specific knowledge are critical. This paper presents a streamlined two-stage framework consisting of Retrieval and Re-ranking to enhance legal document retrieval efficiency and accuracy. Our approach employs a fine-tuned Bi-Encoder for rapid candidate retrieval, followed by a Cross-Encoder for precise re-ranking, both optimized through strategic negative example mining. Key innovations include the introduction of the Exist@m metric to evaluate retrieval effectiveness and the use of semi-hard negatives to mitigate training bias, which significantly improved re-ranking performance. Evaluated on the SoICT Hackathon 2024 for Legal Document Retrieval, our team, 4Huiter, achieved a top-three position. While top-performing teams employed ensemble models and iterative self-training on large bge-m3 architectures, our lightweight, single-pass approach offered a competitive alternative with far fewer parameters. The framework demonstrates that optimized data processing, tailored loss functions, and balanced negative sampling are pivotal for building robust retrieval-augmented systems in legal contexts.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.05)
- Asia > China > Hong Kong (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (3 more...)
IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders
Deshmukh, Sneha, Kamble, Prathmesh
Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.
- North America > United States (0.28)
- Europe (0.04)
- Asia > India > Uttar Pradesh (0.04)
- Asia > India > Maharashtra (0.04)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Legal Requirements Translation from Law
Singhal, Anmol, Breaux, Travis
Software systems must comply with legal regulations, which is a resource-intensive task, particularly for small organizations and startups lacking dedicated legal expertise. Extracting metadata from regulations to elicit legal requirements for software is a critical step to ensure compliance. However, it is a cumbersome task due to the length and complex nature of legal text. Although prior work has pursued automated methods for extracting structural and semantic metadata from legal text, key limitations remain: they do not consider the interplay and interrelationships among attributes associated with these metadata types, and they rely on manual labeling or heuristic-driven machine learning, which does not generalize well to new documents. In this paper, we introduce an approach based on textual entailment and in-context learning for automatically generating a canonical representation of legal text, encodable and executable as Python code. Our representation is instantiated from a manually designed Python class structure that serves as a domain-specific metamodel, capturing both structural and semantic legal metadata and their interrelationships. This design choice reduces the need for large, manually labeled datasets and enhances applicability to unseen legislation. We evaluate our approach on 13 U.S. state data breach notification laws, demonstrating that our generated representations pass approximately 89.4% of test cases and achieve a precision and recall of 82.2 and 88.7, respectively.
- North America > United States > Maryland (0.14)
- North America > United States > California (0.04)
- North America > United States > Virginia (0.04)
- (15 more...)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval
Banerjee, Puspendu, Mazumdar, Aritra, Ansar, Wazib, Goswami, Saptarsi, Chakrabarti, Amlan
The judiciary, as one of democracy's three pillars, is dealing with a rising amount of legal issues, needing careful use of judicial resources. This research presents a complex framework that leverages Data Science methodologies, notably Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques, to improve the efficiency of analyzing Calcutta High Court verdicts. Our framework focuses on two key aspects: first, the creation of a robust summarization mechanism that distills complex legal texts into concise and coherent summaries; and second, the development of an intelligent system for retrieving similar cases, which will assist legal professionals in research and decision making. By fine-tuning the Pegasus model using case head note summaries, we achieve significant improvements in the summarization of legal cases. Our two-step summarizing technique preserves crucial legal contexts, allowing for the production of a comprehensive vector database for RAG. The RAG-powered framework efficiently retrieves similar cases in response to user queries, offering thorough overviews and summaries. This technique not only improves legal research efficiency, but it also helps legal professionals and students easily acquire and grasp key legal information, benefiting the overall legal scenario.